LangSmithLoader

This notebook provides a quick overview for getting started with the LangSmithLoader. For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.

Overview

Integration details

Class | Package | Local | Serializable | PY support
LangSmithLoader | @langchain/core | | beta |

Loader features

Source | Web Loader | Node Envs Only
LangSmithLoader | |

This guide shows how to load the examples in a LangSmith dataset as documents using the LangSmithLoader in LangChain.

Setup

To access the LangSmith document loader, you’ll need to install @langchain/core, create a LangSmith account, and get an API key.

Credentials

Sign up at https://langsmith.com and generate an API key. Once you’ve done this, set the LANGSMITH_API_KEY environment variable:

export LANGSMITH_API_KEY="your-api-key"
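
Before constructing a client, it can help to fail early when the key is missing. The following is a minimal sketch: requireEnv is a hypothetical helper, not part of langsmith or @langchain/core, and it takes the environment as a plain object so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper (not part of langsmith or @langchain/core):
// returns the value of a required environment variable, or throws.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an in-memory environment object.
const exampleEnv: Env = { LANGSMITH_API_KEY: "your-api-key" };
const apiKey = requireEnv("LANGSMITH_API_KEY", exampleEnv);
```

In a Node script, the same helper would typically be called with process.env as the second argument.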

Installation

The LangSmithLoader integration lives in the @langchain/core package:

yarn add @langchain/core

Create example dataset

For this example, we’ll create a new dataset which we’ll use in our document loader.

import { Client as LangSmithClient } from "langsmith";
import { faker } from "@faker-js/faker";

// The client reads LANGSMITH_API_KEY from the environment.
const lsClient = new LangSmithClient();

const datasetName = "LangSmith Few Shot Datasets Notebook";

// Generate 10 examples; inputs, outputs, and metadata are matched by index.
const exampleInputs = Array.from({ length: 10 }, () => ({
  input: faker.lorem.paragraph(),
}));
const exampleOutputs = Array.from({ length: 10 }, () => ({
  output: faker.lorem.sentence(),
}));
const exampleMetadata = Array.from({ length: 10 }, () => ({
  companyCatchPhrase: faker.company.catchPhrase(),
}));

// Remove any dataset left over from a previous run. The call is wrapped in
// try/catch on the assumption that it fails when no such dataset exists yet.
try {
  await lsClient.deleteDataset({ datasetName });
} catch {
  // Nothing to delete.
}

const dataset = await lsClient.createDataset(datasetName);

const examples = await lsClient.createExamples({
  inputs: exampleInputs,
  outputs: exampleOutputs,
  metadata: exampleMetadata,
  datasetId: dataset.id,
});

Now instantiate the loader and point it at the dataset:

import { LangSmithLoader } from "@langchain/core/document_loaders/langsmith";

const loader = new LangSmithLoader({
  datasetName: "LangSmith Few Shot Datasets Notebook",
  // Instead of a datasetName, you can alternatively provide a datasetId
  // datasetId: dataset.id,
  contentKey: "input",
  limit: 5,
  // formatContent: (content) => content,
  // ... other options
});
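
The contentKey option selects which field of each example's inputs becomes the document's page content; in the output under Load, pageContent matches inputs.input. As an illustration only, and not the loader's actual implementation, the mapping can be sketched with plain objects. The names ExampleLike, DocumentLike, and examplesToDocuments are hypothetical, and the sketch assumes a single top-level key, stringifies non-string values with JSON.stringify, and treats limit as a simple cap on the number of examples.

```typescript
// Illustration only: hypothetical types and helper, not the loader's source.
interface ExampleLike {
  id: string;
  inputs: Record<string, unknown>;
  outputs?: Record<string, unknown>;
}

interface DocumentLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

function examplesToDocuments(
  examples: ExampleLike[],
  contentKey: string,
  limit?: number
): DocumentLike[] {
  // Assumption: limit caps how many examples are converted.
  const selected = limit === undefined ? examples : examples.slice(0, limit);
  return selected.map((example) => {
    const raw = example.inputs[contentKey];
    if (raw === undefined) {
      throw new Error(`Example ${example.id} has no input key "${contentKey}"`);
    }
    // Strings pass through unchanged; other values are serialized.
    const pageContent = typeof raw === "string" ? raw : JSON.stringify(raw);
    // The whole example is kept as metadata, as in the output under Load.
    return { pageContent, metadata: { ...example } };
  });
}

const sampleExamples: ExampleLike[] = [
  { id: "a", inputs: { input: "first question" }, outputs: { output: "first answer" } },
  { id: "b", inputs: { input: "second question" }, outputs: { output: "second answer" } },
];
const sampleDocs = examplesToDocuments(sampleExamples, "input", 1);
```

Keeping the full example in metadata means nothing is lost when only one input field is used as content.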

Load

const docs = await loader.load()
docs[0]

{
  pageContent: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.',
  metadata: {
    id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',
    created_at: '2024-08-20T17:01:38.984045+00:00',
    modified_at: '2024-08-20T17:01:38.984045+00:00',
    name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',
    dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',
    source_run_id: null,
    metadata: {
      dataset_split: [Array],
      companyCatchPhrase: 'Integrated solution-oriented secured line'
    },
    inputs: {
      input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
    },
    outputs: {
      output: 'Excepturi adeptio spectaculum bis volaticus accusamus.'
    }
  }
}

console.log(docs[0].metadata)

{
  id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',
  created_at: '2024-08-20T17:01:38.984045+00:00',
  modified_at: '2024-08-20T17:01:38.984045+00:00',
  name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',
  dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',
  source_run_id: null,
  metadata: {
    dataset_split: [ 'base' ],
    companyCatchPhrase: 'Integrated solution-oriented secured line'
  },
  inputs: {
    input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
  },
  outputs: { output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }
}

console.log(docs[0].metadata.inputs)

{
  input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
}

console.log(docs[0].metadata.outputs)

{ output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }

console.log(Object.keys(docs[0].metadata))

[
  'id',
  'created_at',
  'modified_at',
  'name',
  'dataset_id',
  'source_run_id',
  'metadata',
  'inputs',
  'outputs'
]
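
Because each document keeps the full example under metadata, the input and output can be paired back up after loading, for instance to build few-shot examples. The following is a minimal sketch, assuming the "input" and "output" keys used in this dataset. LoadedDoc, FewShotPair, and toFewShotPairs are hypothetical names, and the sample document is abbreviated from the output above rather than fetched from LangSmith.

```typescript
// Hypothetical shape covering only the fields this sketch reads.
interface LoadedDoc {
  pageContent: string;
  metadata: {
    inputs: { input: string };
    outputs?: { output: string } | null;
  };
}

interface FewShotPair {
  input: string;
  output: string;
}

// Pair each document's input with its output, skipping documents
// whose example has no recorded outputs.
function toFewShotPairs(docs: LoadedDoc[]): FewShotPair[] {
  const pairs: FewShotPair[] = [];
  for (const doc of docs) {
    const outputs = doc.metadata.outputs;
    if (outputs) {
      pairs.push({ input: doc.metadata.inputs.input, output: outputs.output });
    }
  }
  return pairs;
}

// Abbreviated stand-ins for loaded documents.
const loadedDocs: LoadedDoc[] = [
  {
    pageContent: "Conventus supellex aegrotatio termes.",
    metadata: {
      inputs: { input: "Conventus supellex aegrotatio termes." },
      outputs: { output: "Excepturi adeptio spectaculum bis volaticus accusamus." },
    },
  },
  {
    pageContent: "An example without outputs.",
    metadata: { inputs: { input: "An example without outputs." }, outputs: null },
  },
];

const pairs = toFewShotPairs(loadedDocs);
```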

API reference

For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.

